v0.3 - Pipeline now validated on five real genomes #88

Merged on Feb 9, 2024 (42 commits)


muffato
Member

@muffato muffato commented Jan 9, 2024

I've now tested the pipeline on five real, complete genomes (all under 100 Mbp). This PR fixes the issues I found along the way:

  • Fixed the conditional runs of blastn
  • Fixed the generation of the no-hit list
  • Fixed the conversion of the unaligned input files to Fasta
  • Fixed the documentation about preparing the NT database
  • Fixed the detection of the NT database in the nf-core module
  • The pipeline now supports samplesheets generated by the
    nf-core/fetchngs pipeline by passing the
    --fetchngs_samplesheet true option.
  • FastQ files can bypass the conversion to Fasta
  • Fixed missing BUSCO results from the blobdir (only 1 BUSCO was loaded)
  • Fixed the default category used to colour the blob plots
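
To illustrate the "no-hit list" item above, here is a minimal sketch of the idea: collect the sequence IDs that never appear as a query in a BLAST tabular output. It is an assumption-laden illustration, not the pipeline's actual implementation; the function name `no_hit_ids`, the tab-separated input layout (query ID in the first column), and the toy data are all hypothetical.

```python
from typing import Iterable, List


def no_hit_ids(fasta_lines: Iterable[str], blast_lines: Iterable[str]) -> List[str]:
    """Return the IDs of sequences that have no row in a BLAST tabular output.

    Assumes tab-separated BLAST rows with the query ID in the first column,
    and FASTA headers whose ID is the first token after ">".
    """
    # Query IDs seen in the BLAST output (skip blank and comment lines).
    hit_ids = {
        line.rstrip("\n").split("\t", 1)[0]
        for line in blast_lines
        if line.strip() and not line.startswith("#")
    }
    # Sequence IDs in input order, ignoring headers that carry no ID.
    all_ids = [
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    ]
    return [seq_id for seq_id in all_ids if seq_id not in hit_ids]


# Toy data, invented for this example.
fasta = [">chr1 example", "ACGT", ">chr2", "GGCC", ">chr3", "TTAA"]
blast = ["chr1\tsubjA\t99.0", "chr3\tsubjB\t97.5"]
print(no_hit_ids(fasta, blast))
```

Keeping the input order of the FASTA file makes the resulting list reproducible between runs, which helps when the list drives a conditional follow-up search.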

I want to release this as version 0.3.
The following is still needed for version 1.0:

  • Validation on >100 Mbp genomes, possibly with some improved resource requests.
  • Validation of usage with non-public assemblies.

which may require another 0.* release.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@muffato muffato self-assigned this Jan 9, 2024

github-actions bot commented Jan 9, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit ff87d76

+| ✅ 134 tests passed       |+
#| ❔  23 tests were ignored |#
!| ❗   1 tests had warnings |!

❗ Test warnings:

❔ Tests ignored:

  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: assets/nf-core-blobtoolkit_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-blobtoolkit_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-blobtoolkit_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: conf/igenomes.config
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/nf-core-blobtoolkit_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-blobtoolkit_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-blobtoolkit_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/blobtoolkit/blobtoolkit/.github/workflows/awstest.yml
  • template_strings - template_strings
  • merge_markers - merge_markers

✅ Tests passed:

Run details

  • nf-core/tools version 2.11
  • Run at 2024-02-09 11:28:25

muffato added 12 commits January 9, 2024 14:55
With the `-0` option, the output file ("interleaved") is empty for PacBio
because all the reads go to the "other" file. Whereas paired-reads all go to
the "interleaved" file and none to the "other" file.

The simplest is to not use the `-0` option. Then, `samtools fasta` simply sends
all the reads to the standard output.
This simplifies connecting to the fetchngs pipeline
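
The `-0` behaviour described in the commit message above can be modelled with a short sketch. It is a simplified model of how `samtools fasta` routes reads when `-0` is the only output option given, based on the SAM READ1/READ2 flag bits; it is not samtools' own code, and the function name `destination` and the stream labels are hypothetical.

```python
READ1 = 0x40  # SAM flag bit: first segment in the template
READ2 = 0x80  # SAM flag bit: last segment in the template


def destination(flag: int, use_dash_zero: bool) -> str:
    """Name the stream a read goes to in this simplified model.

    With -0, reads that have both or neither of READ1/READ2 set are
    diverted to the "other" file; everything else goes to stdout.
    Without -0, every read goes to stdout.
    """
    is_read1 = bool(flag & READ1)
    is_read2 = bool(flag & READ2)
    if use_dash_zero and is_read1 == is_read2:
        return "other"
    return "stdout"


# Unaligned single-ended read (e.g. PacBio): only the "unmapped" bit is set.
print(destination(4, use_dash_zero=True))
print(destination(4, use_dash_zero=False))
# Unaligned paired reads: flags 77 (READ1) and 141 (READ2).
print(destination(77, use_dash_zero=True))
print(destination(141, use_dash_zero=True))
```

In this model, dropping `-0` gives one code path for both data types, which matches the reasoning in the commit message.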

Python linting (black) is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks.
To fix this CI test, please run:

  • Install black: pip install black
  • Fix formatting errors in your pipeline: black .

Once you push these changes the test should pass, and you can hide this comment 👍

We highly recommend setting up Black in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!

@muffato muffato changed the title from "Fixes identified when moving the pipeline to production" to "v0.3 - Pipeline now validated on five real genomes" Jan 19, 2024
@muffato
Member Author

muffato commented Jan 19, 2024

Ignore the prettier error for now. .devcontainer/devcontainer.json hasn't changed since the last pull-request, where it passed prettier. The CI must be picking up a new version of prettier which we don't have on the farm yet. Will be addressed later.

@muffato muffato marked this pull request as ready for review January 19, 2024 12:55
@muffato
Member Author

muffato commented Jan 19, 2024

To run on the farm, use these arguments:

--taxdump /lustre/scratch123/tol/resources/taxonomy/latest/new_taxdump
--busco /lustre/scratch123/tol/resources/busco/latest
--blastp /lustre/scratch123/tol/resources/uniprot_reference_proteomes/latest/reference_proteomes.dmnd
--blastx /lustre/scratch123/tol/resources/uniprot_reference_proteomes/latest/reference_proteomes.dmnd
--blastn /lustre/scratch123/tol/teams/tolit/users/mm49/nextflow/btk/prod_test/new_nt/untar

(There isn't a central NT database with taxonomy information yet; it's still in progress.)
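
As a convenience, the farm arguments listed above could be kept in a parameters file rather than retyped on each run. The sketch below writes them out as JSON suitable for Nextflow's `-params-file` option; the helper name `to_params_json` and the file name `farm.json` are hypothetical, and any other arguments a run needs are not covered here.

```python
import json

# Paths copied from the argument list above; keys are the option names
# without the leading "--".
farm_params = {
    "taxdump": "/lustre/scratch123/tol/resources/taxonomy/latest/new_taxdump",
    "busco": "/lustre/scratch123/tol/resources/busco/latest",
    "blastp": "/lustre/scratch123/tol/resources/uniprot_reference_proteomes/latest/reference_proteomes.dmnd",
    "blastx": "/lustre/scratch123/tol/resources/uniprot_reference_proteomes/latest/reference_proteomes.dmnd",
    "blastn": "/lustre/scratch123/tol/teams/tolit/users/mm49/nextflow/btk/prod_test/new_nt/untar",
}


def to_params_json(params: dict) -> str:
    """Serialise the parameters as indented JSON."""
    return json.dumps(params, indent=2)


# Print the JSON; redirect it to a file such as farm.json (hypothetical name).
print(to_params_json(farm_params))
```

The saved file could then be passed as `nextflow run . -params-file farm.json`, alongside whatever profile and output directory the run requires.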

@muffato
Member Author

muffato commented Jan 22, 2024

Prettier error fixed, thanks to @BethYates

@BethYates
Contributor

The code looks good but the images aren't being published to the results directory

@muffato
Member Author

muffato commented Feb 9, 2024

@BethYates

  1. The images are now published in their directory
  2. Introduced an output directory named blobtoolkit and turned BUSCO to lower case.
  3. Images are generated as PNG by default; SVG can be generated with the option --image_format svg
  4. I had to pin the version of nf-core for the linting to pass

@muffato muffato merged commit 681362c into dev Feb 9, 2024
6 checks passed
@muffato muffato deleted the fixes_for_prod branch February 9, 2024 12:49
@muffato muffato mentioned this pull request Feb 9, 2024
10 tasks